NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Chu, Tianzhe Chu; Tong, Shengbang; Ding, Tianjiao; Dai, Xili; Haeffele, Benjamin; Vidal, René; Ma, Yi (May 2024, International Conference on Learning Representations (ICLR))

The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models such as CLIP and cluster images effectively and efficiently at scale. We first developed a novel algorithm to estimate the number of clusters in a given dataset. We then show that the pre-trained features are significantly more structured by further optimizing the rate reduction objective. The resulting features may significantly improve the clustering accuracy, e.g., from 57\% to 66\% on ImageNet-1k. Furthermore, by leveraging CLIP's multimodality bridge between image and text, we develop a simple yet effective self-labeling algorithm that produces meaningful captions for the clusters. Through extensive experiments, we show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k. It also extends to datasets that are not curated for clustering, such as LAION-Aesthetics and WikiArts. We released the code in https://github.com/LeslieTrue/CPP
more » « less
Full Text Available
Unsupervised Manifold Linearizing and Clustering

Ding, Tianjiao; Tong, Shengbang; Chan, Ryan; Dai, Xili; Ma, Yi; Haeffele, Benjamin (October 2023, International Conference on Computer Vision)

We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images can not be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
more » « less
Full Text Available
Incremental Learning of Structured Memory via Closed-Loop Transcription

Tong, Shengbang; Dai, Xili; Wu, Ziyang; Li, Mingyang; Yi, Brent; Ma, Yi (June 2023, International Conference on Learning Representations)

This work proposes a minimal computational model for learning structured memories of multiple object classes in an incremental setting. Our approach is based on establishing a closed-loop transcription between the classes and a corresponding set of subspaces, known as a linear discriminative representation, in a lowdimensional feature space. Our method is simpler than existing approaches for incremental learning, and more efficient in terms of model size, storage, and computation: it requires only a single, fixed-capacity autoencoding network with a feature space that is used for both discriminative and generative purposes. Network parameters are optimized simultaneously without architectural manipulations, by solving a constrained minimax game between the encoding and decoding maps over a single rate reduction-based objective. Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work of generative replay on MNIST, CIFAR-10, and ImageNet-50, despite requiring fewer resources. Source code can be found at https://github.com/tsb0601/i-CTRL
more » « less
Full Text Available
Revisiting Sparse Convolutional Model for Visual Recognition

Dai, Xili; Li, Mingyang; Zhai, Pengyuan; Tong, Shengbang; Gao, Xingjian; Huang, Shao-Lun; Zhu, Zhihui; You, Chong; Ma, Yi (January 2022, Advances in neural information processing systems)

Full Text Available
Revisiting Sparse Convolutional Model for Visual Recognition

Dai, Xili; Li, Mingyang; Zhai, Pengyuan; Tong, Shengbang; Gao, Xingjian; Huang, Shao-Lun; Zhu, Zhihui; You, Chong; Ma, Yi (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction

https://doi.org/10.3390/e24040456

Dai, Xili; Tong, Shengbang; Li, Mingyang; Wu, Ziyang; Psenka, Michael; Chan, Kwan Ho; Zhai, Pengyuan; Yu, Yaodong; Yuan, Xiaojun; Shum, Heung-Yeung; et al (April 2022, Entropy)

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcriptionbetween a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as a two-player minimax game between the encoder and decoderfor the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and arguably better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so-learned features of the multiple classes are structured instead of hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace.
more » « less
Full Text Available

Search for: All records